Overview

What is a model?

A model is a mathematical relationship that comes with a story. Stokey and Zeckhauser (1978) give a definition: “A model is a simplified representation of some aspect of the real world, sometimes of an object, sometimes of a situation or a process”.

A good model reduces a complex situation to a set of essential mechanisms, or dynamics, that an analyst needs in order to make a good decision.

A bad model mis-characterizes the mechanism of interest, is too simple to capture important dynamics, or is too complicated to be calibrated or understood.

More: Otto and Day (2011), Frigg and Hartmann (2012), Basu and Andrews (2013), Heesterbeek et al. (2015)

Parsimony and complexity: models of DNA

All models are wrong, but some are useful. – George Box

A good model is suited to a particular problem, and balances parsimony and realism, simplicity and complexity.

Models are used to answer hard questions

Sometimes the corresponding empirical study may be infeasible or unethical to conduct in real life.

For example,

What would happen if every injection drug user had access to naloxone? How many fatal overdoses would be averted? Is this intervention program cost-effective?

What would happen if the government eliminated funding for smoking cessation programs?

Modeling and scientific hypotheses

Models formalize scientific hypotheses about the mechanism that produces a phenomenon of interest.

When data agree with the predictions of our model, we may accumulate evidence that the hypotheses underlying the model are reasonable.

When we observe data that disagree with the predictions of our model, this might be evidence that our hypotheses are wrong.

Examples

HIV in the US

Krebs et al. (2019)

Counting injection drug users using networks

Crawford, Wu, and Heimer (2018)

Rotavirus

Pitzer et al. (2012)

Typhoid

Bilcke et al. (2019)

Bioterror: anthrax

Wein, Craft, and Kaplan (2003)

Bioterror: smallpox

Kaplan, Craft, and Wein (2002)

Syringe exchange

Kaplan (1995)

Toilets

Gonsalves, Kaplan, and Paltiel (2015)

How to do it

A recipe for modeling

  1. State the thing you want to learn about
  2. State what you know
  3. Make some assumptions linking what you know to the thing you want to learn about
  4. Use the model output to learn about the thing you want to learn about
  5. Evaluate the model and its sensitivity to assumptions

Stokey and Zeckhauser (1978), Vynnycky and White (2010), Otto and Day (2011)

A recipe for modeling

Example: how many people inject drugs?

Question: How many people inject drugs (e.g. opioids) in my city?

Data: counts of \(m\) individuals’ emergency room visits for overdose, \(X_1,\ldots,X_m\), all positive, for one unit of time (e.g. one year). We only see \(X_i\) if person \(i\) had at least one overdose.

Why is this a hard problem? It seems that we do not have enough data to learn what we want to know: people who did not overdose never appear in the data at all!

Let’s illustrate a modeling-based solution using a stylized depiction of the overdoses. This is a preview of the mindset and recipe for modeling that you will learn about in this course.

Example: following the recipe (part 1)

  1. State the thing you want to learn about

Let \(N\) be the number of people who inject drugs in the city.

  2. State what you know

Let \(X_1,\ldots,X_N\) be the number of times (possibly zero) each has overdosed and been taken to the emergency room, in one year.

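Let \(M\) be the number of people who have had at least one overdose, and let \(m\) be its observed value. Labeling these people first, we observe \(X_1,\ldots,X_m > 0\); the remaining \(N-m\) people have \(X_i = 0\) and never appear in the data.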
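To make the setup concrete, here is a minimal Python sketch of the data-generating process, assuming the constant-rate Poisson model introduced in the next step. The function name `simulate_overdose_counts` and the values \(N = 5000\) and \(\lambda = 0.8\) are hypothetical, chosen only for illustration. It draws a count for every one of the \(N\) people and keeps only the positive ones, which is all the emergency room data would show.

```python
import math
import random


def simulate_overdose_counts(n_people, rate, seed=None):
    """Simulate one year of ER overdose counts under a constant-rate model.

    Each of n_people gets X_i ~ Poisson(rate); only X_i > 0 is returned,
    mimicking the fact that people with zero overdoses never appear in the data.
    """
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication method: multiply uniforms until the product
        # drops to exp(-lam) or below; adequate for small rates like these.
        threshold = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= threshold:
                return k
            k += 1

    all_counts = [poisson(rate) for _ in range(n_people)]  # X_1, ..., X_N
    return [x for x in all_counts if x > 0]                # what we observe


# Hypothetical example: N = 5000 people, lambda = 0.8 overdoses per year
observed = simulate_overdose_counts(5000, 0.8, seed=1)
m = len(observed)  # the observed value of M
```

Here `m` plays the role of the observed \(M\); the \(N - m\) zeros are discarded, just as they are invisible in real data.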

Example: a picture

Example: following the recipe (part 2)

  3. Make some assumptions linking what you know to the thing you want to learn about

Assume everyone who has had at least one overdose went to the emergency room.

Assume every person who injects drugs overdoses at the same constant rate \(\lambda\) per year, independently of everyone else. (This is a gross simplification of reality!)

  4. Use the model output to learn about the thing you want to learn about

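We can use the constant-rate assumption to learn about \(\lambda\) from \(X_1,\ldots,X_m\).

(In fact, \(X_i\sim\text{Poisson}(\lambda)\). Because we only see people with \(X_i>0\), the observed counts follow a zero-truncated Poisson distribution, and we can still estimate \(\lambda\) from them.)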
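A minimal sketch of that estimation step, assuming the observed counts are exactly the positive draws from a common Poisson(\(\lambda\)). Setting the derivative of the zero-truncated Poisson log-likelihood to zero gives \(\bar{X} = \lambda / (1 - e^{-\lambda})\); the right-hand side increases from 1 to infinity, so bisection finds the maximum-likelihood \(\hat\lambda\). The function name `estimate_rate` and the toy counts are hypothetical.

```python
import math


def estimate_rate(counts, tol=1e-10):
    """Maximum-likelihood estimate of lambda from zero-truncated Poisson counts.

    The score equation reduces to mean(counts) = lam / (1 - exp(-lam)).
    The right-hand side increases from 1 (as lam -> 0) to infinity,
    so the root is unique and bisection finds it.
    """
    if not counts or min(counts) < 1:
        raise ValueError("counts must be a non-empty list of positive integers")
    x_bar = sum(counts) / len(counts)
    if x_bar <= 1:
        # Every observed count is 1: the likelihood keeps increasing as lam -> 0.
        raise ValueError("mean count must exceed 1 for a finite estimate")

    def truncated_mean(lam):
        return lam / -math.expm1(-lam)  # lam / (1 - e^{-lam}), computed stably

    lo, hi = 1e-12, x_bar  # truncated_mean(x_bar) > x_bar, so the root lies below x_bar
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if truncated_mean(mid) < x_bar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lam_hat = estimate_rate([1, 1, 2, 3])  # hypothetical counts with mean 1.75
```

Bisection is used here only because it is simple and guaranteed to converge on a monotone equation; any root finder would do.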

Example: following the recipe (part 3)

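Then, each person appears in the data only if \(X_i>0\), which happens with probability \(\Pr(X_i>0) = 1-\Pr(X_i=0) = 1-e^{-\lambda}\). So \[ M \approx \Pr(X_i>0) \times (\text{number at risk}) \] and, since all \(N\) people are at risk, \[ M \approx (1-e^{-\lambda}) N . \]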

Rearranging, and substituting the observed \(m\) for \(M\) and the estimate \(\hat\lambda\) for \(\lambda\), we have the estimate

\[ \hat{N} = \frac{m}{1-e^{-\hat\lambda}} \]
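The plug-in estimator is one line of arithmetic. A minimal sketch, where `estimate_population` is a hypothetical name and \(\hat\lambda\) is assumed to come from the zero-truncated Poisson fit:

```python
import math


def estimate_population(m, lam_hat):
    """Plug-in estimate N_hat = m / (1 - exp(-lam_hat))."""
    if m < 0 or lam_hat <= 0:
        raise ValueError("need m >= 0 and lam_hat > 0")
    p_observed = -math.expm1(-lam_hat)  # Pr(X_i > 0) = 1 - e^{-lam_hat}
    return m / p_observed


# Hypothetical inputs: 300 people seen in the ER, lam_hat = 0.5 per year
n_hat = estimate_population(300, 0.5)  # about 762
```

As a sanity check, with \(\hat\lambda = \ln 2\) we get \(\Pr(X_i>0) = 1/2\), so \(\hat N = 2m\): the model says half of the people who inject drugs never appear in the data.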

  5. Evaluate the model and its sensitivity to assumptions (saved for later)

Example: post-mortem

Where was the magic step?

Specifying a common rate of overdose per drug user

This allows estimation of \(\lambda\), and implies a probability distribution for \(M\), which we can use to estimate \(N\).

Some questions:

  • What are the weaknesses of this model?
  • Which assumptions are likely to be violated in practice?
  • Does it matter if we get the constant rate assumption exactly right?
  • If not, are we likely to over- or under-estimate \(N\)?
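One way to explore the last two questions is by simulation. The sketch below draws person-specific rates \(\lambda_i\) from a gamma distribution with mean `mean_rate` (the gamma choice, the function names, and the parameter values are all hypothetical assumptions of this example, not part of the model above), generates counts, and then applies the constant-rate estimator anyway. A large `shape` makes rates nearly equal; a small `shape` makes them very unequal.

```python
import math
import random


def poisson_draw(lam, rng):
    # Knuth's multiplication method; adequate for the modest rates used here
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fit_truncated_poisson(counts, tol=1e-10):
    # Zero-truncated Poisson MLE: solve mean(counts) = lam / (1 - e^{-lam}) by bisection
    x_bar = sum(counts) / len(counts)
    if x_bar <= 1:
        raise ValueError("mean count must exceed 1 for a finite estimate")
    lo, hi = 1e-12, x_bar
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < x_bar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def estimate_n_with_varying_rates(n_true, mean_rate, shape, seed=0):
    """Simulate lam_i ~ Gamma(shape, scale=mean_rate/shape), so E[lam_i] = mean_rate,
    then apply the constant-rate estimator N_hat = m / (1 - e^{-lam_hat})."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_true):
        lam_i = rng.gammavariate(shape, mean_rate / shape)  # person-specific rate
        counts.append(poisson_draw(lam_i, rng))
    observed = [x for x in counts if x > 0]
    lam_hat = fit_truncated_poisson(observed)
    return len(observed) / -math.expm1(-lam_hat)


# Hypothetical scenario: true N = 10000, mean rate 1 overdose per year
nearly_constant = estimate_n_with_varying_rates(10_000, 1.0, shape=1000.0)
highly_variable = estimate_n_with_varying_rates(10_000, 1.0, shape=0.5)
```

In this particular scenario, the nearly constant rates give an estimate close to 10000, while the highly variable rates give one well below it: many low-rate people produce zeros that a single Poisson fitted to the positive counts does not anticipate. Other forms of heterogeneity, or other violated assumptions, could behave differently.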

The big picture

Models: useful and dangerous

Models are useful when they are:

  • intuitive: they formalize hypotheses
  • statistical: they limit free parameters
  • interpretable: parameters have real-world meaning

Models are dangerous when they:

  • limit hypotheses to models that are easy to specify
  • have an inflexible structure that limits fitting
  • do not faithfully represent the mechanism of interest
  • have dynamics that do not generalize to alternative scenarios of interest

Statistical vs mechanistic models

If you have taken a statistics class, you have seen statistical approaches to explaining variation. For example, consider the “statistical regression model” \[ y = \alpha + \beta x + \epsilon \] If we regard \(x\) as a treatment and \(y\) as a health outcome for a given patient, then we would like to think of \(\beta\) as the “effect” of the treatment.

This model posits a linear relationship between treatment and outcome. Given a one-unit change in \(x\), we expect the outcome \(y\) to change by an increment of \(\beta\).
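To see the one-unit-change reading in action, here is a minimal ordinary-least-squares sketch in plain Python. The function name `fit_simple_regression` and the simulated data (generated with hypothetical values \(\alpha = 2\) and \(\beta = 0.5\)) are illustrative assumptions; the fitted slope should land near 0.5, the expected change in \(y\) per one-unit change in \(x\).

```python
import random


def fit_simple_regression(x, y):
    """Ordinary least squares for y = alpha + beta * x + epsilon."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx              # slope: change in y per unit change in x
    alpha = y_bar - beta * x_bar  # intercept: fitted line passes through the means
    return alpha, beta


# Hypothetical data: treatment x and outcome y generated with alpha = 2, beta = 0.5
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
alpha_hat, beta_hat = fit_simple_regression(x, y)
```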

Our philosophy

We think there is no difference between “statistical” and “mechanistic” models, except for the stories we tell about their structure and coefficients. In our view:

  • We should strive to interpret statistical models in a mechanistic way, and reject them if they do not help us learn about the mechanism of interest.
  • We should treat mechanistic models as statistical models and fit them to data, whenever possible. When not possible, we should ask what new data we ought to collect, or how to identify only the mechanistic features of interest.

References

Basu, Sanjay, and Jason Andrews. 2013. “Complexity in Mathematical Models of Public Health Policies: A Guide for Consumers of Models.” PLoS Medicine 10 (10). Public Library of Science: e1001540.

Bilcke, Joke, Marina Antillón, Zoë Pieters, Elise Kuylen, Linda Abboud, Kathleen M Neuzil, Andrew J Pollard, A David Paltiel, and Virginia E Pitzer. 2019. “Cost-Effectiveness of Routine and Campaign Use of Typhoid Vi-Conjugate Vaccine in Gavi-Eligible Countries: A Modelling Study.” The Lancet Infectious Diseases. Elsevier.

Crawford, Forrest W, Jiacheng Wu, and Robert Heimer. 2018. “Hidden Population Size Estimation from Respondent-Driven Sampling: A Network Approach.” Journal of the American Statistical Association 113 (522). Taylor & Francis: 755–66.

Frigg, Roman, and Stephan Hartmann. 2012. “Models in Science, Stanford Encyclopedia of Philosophy.” https://plato.stanford.edu/entries/models-science/.

Gonsalves, Gregg S, Edward H Kaplan, and A David Paltiel. 2015. “Reducing Sexual Violence by Increasing the Supply of Toilets in Khayelitsha, South Africa: A Mathematical Model.” PLoS One 10 (4). Public Library of Science: e0122244.

Heesterbeek, Hans, Roy M Anderson, Viggo Andreasen, Shweta Bansal, Daniela De Angelis, Chris Dye, Ken TD Eames, et al. 2015. “Modeling Infectious Disease Dynamics in the Complex Landscape of Global Health.” Science 347 (6227): aaa4339.

Kaplan, Edward H. 1995. “Probability Models of Needle Exchange.” Operations Research 43 (4). INFORMS: 558–69.

Kaplan, Edward H, David L Craft, and Lawrence M Wein. 2002. “Emergency Response to a Smallpox Attack: The Case for Mass Vaccination.” Proceedings of the National Academy of Sciences 99 (16). National Acad Sciences: 10935–40.

Krebs, Emanuel, Benjamin Enns, Linwei Wang, Xiao Zang, Dimitra Panagiotoglou, Carlos Del Rio, Julia Dombrowski, et al. 2019. “Developing a Dynamic HIV Transmission Model for 6 US Cities: An Evidence Synthesis.” PLoS One 14 (5). Public Library of Science: e0217559.

Otto, Sarah P, and Troy Day. 2011. A Biologist’s Guide to Mathematical Modeling in Ecology and Evolution. Princeton University Press.

Pitzer, Virginia E, Katherine E Atkins, Birgitte Freiesleben de Blasio, Thierry Van Effelterre, Christina J Atchison, John P Harris, Eunha Shim, et al. 2012. “Direct and Indirect Effects of Rotavirus Vaccination: Comparing Predictions from Transmission Dynamic Models.” PLoS One 7 (8). Public Library of Science: e42320.

Stokey, Edith, and Richard Zeckhauser. 1978. Primer for Policy Analysis. WW Norton.

Vynnycky, Emilia, and Richard White. 2010. An Introduction to Infectious Disease Modelling. Oxford University Press.

Wein, Lawrence M, David L Craft, and Edward H Kaplan. 2003. “Emergency Response to an Anthrax Attack.” Proceedings of the National Academy of Sciences 100 (7). National Acad Sciences: 4346–51.